Incremental Spectral Clustering With Application to Monitoring of Evolving Blog Communities
Authors
Abstract
In recent years, spectral clustering has gained attention because of its superior performance compared with traditional clustering algorithms such as K-means. Existing spectral clustering algorithms are all off-line: they cannot incrementally update the clustering result when the data set changes slightly. However, incremental updating is essential to applications such as real-time monitoring of evolving communities on the web or in the blogosphere. Unlike traditional stream data, these applications require incremental algorithms that handle not only insertion and deletion of data points but also similarity changes between existing items. This paper extends standard spectral clustering to such evolving data by introducing the incidence vector/matrix to represent both kinds of dynamics in the same framework and by incrementally updating the eigenvalue system. Our incremental algorithm, initialized by a standard spectral clustering, continuously and efficiently updates the eigenvalue system and generates instant cluster labels as the data set evolves. The algorithm is applied to a blog data set. Compared with recomputing the solution by standard spectral clustering, it achieves similar accuracy at much lower computational cost. Close inspection of the blog content shows that the incremental approach can discover not only the stable blog communities but also the evolution of individual multi-topic blogs.
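To make the idea of an incidence vector and an incremental eigenvalue update concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes the generalized eigenproblem L q = λ D q (with L = D − W), models a similarity change of size dw between items i and j as the rank-one update L + dw·r·rᵀ, and applies a standard first-order perturbation estimate of the eigenvalue. All function names, the toy graph, and the update formula as written are illustrative assumptions; insertion and deletion of data points are not covered.

```python
import numpy as np
from scipy.linalg import eigh


def laplacian_from_weights(W):
    """Return (L, D) for a symmetric similarity matrix W, with L = D - W."""
    D = np.diag(W.sum(axis=1))
    return D - W, D


def incidence_vector(n, i, j):
    """Hypothetical helper: vector with +1 at i and -1 at j, zeros elsewhere."""
    r = np.zeros(n)
    r[i], r[j] = 1.0, -1.0
    return r


def first_order_eigenvalue_shift(lam, q, D, i, j, dw):
    """First-order estimate of the change in lam when w_ij changes by dw.

    Assumes L q = lam D q. Uses dL = dw * r r^T and dD = dw * (e_i e_i^T + e_j e_j^T).
    """
    numerator = dw * ((q[i] - q[j]) ** 2 - lam * (q[i] ** 2 + q[j] ** 2))
    return numerator / (q @ D @ q)


# Toy graph (an assumption for illustration): two triangles joined by a weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1

L, D = laplacian_from_weights(W)
eigvals, eigvecs = eigh(L, D)          # generalized symmetric eigenproblem
lam, q = eigvals[1], eigvecs[:, 1]     # second-smallest eigenpair

# Similarity change: strengthen the bridge between items 2 and 3 slightly.
i, j, dw = 2, 3, 0.01
r = incidence_vector(len(W), i, j)

# Incremental estimate, without re-solving the eigenproblem.
lam_est = lam + first_order_eigenvalue_shift(lam, q, D, i, j, dw)

# Full recomputation, for comparison only.
W_new = W.copy()
W_new[i, j] += dw
W_new[j, i] += dw
L_new, D_new = laplacian_from_weights(W_new)
lam_exact = eigh(L_new, D_new, eigvals_only=True)[1]

print(f"estimate: {lam_est:.6f}  recomputed: {lam_exact:.6f}")
```

The estimate only holds for small changes and a well-separated eigenvalue; a practical incremental method would also need to update the eigenvectors and periodically re-initialize to limit accumulated error.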
Similar resources
Master’s Thesis Pre-Proposal: A Framework for Evolutionary Clustering on Multi-type Relational Data
Rapid development in data acquisition technology has generated large amounts of raw data, providing significant potential for the development of automatic data analysis, classification, and retrieval techniques. Data in many applications, such as social networks, blogs, geosciences, and biomedicine, demonstrate an evolving nature. That is, the similarity between the data instances a...
Pseudo-likelihood methods for community detection in large sparse networks
We consider the problem of community detection in a network, that is, partitioning the nodes into groups that, in some sense, reveal the structure of the network. Many algorithms have been proposed for fitting network models with communities, but most of them do not scale well to large networks, and often fail on sparse networks. We present a fast pseudo-likelihood method for fitting the stocha...
Clustering evolving data using kernel-based methods
Thanks to recent developments in information technologies, there is a profusion of available data in application domains ranging from science and engineering to biology and business. For this reason, the demand for real-time data processing, mining and analysis has grown explosively in recent years. Since labels are usually not available and in general a full under...
Efficient eigen-updating for spectral graph clustering
Partitioning a graph into groups of vertices such that those within each group are more densely connected than vertices assigned to different groups, known as graph clustering, is often used to gain insight into the organisation of large-scale networks and for visualisation purposes. Whereas a large number of dedicated techniques have been recently proposed for static graphs, the design of onli...
Incremental kernel spectral clustering for online learning of non-stationary data
In this work a new model for online clustering named Incremental Kernel Spectral Clustering (IKSC) is presented. It is based on Kernel Spectral Clustering (KSC), a model designed in the Least Squares Support Vector Machines (LS-SVMs) framework, with primal-dual setting. The IKSC model is developed to quickly adapt itself to a changing environment, in order to learn evolving clusters with high a...